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Abstract 


The  first  generation  of  expert  systems  (e.g., MYCIN, DENDRAL  ,Rl  )  is  often  characterized  as 
only  using  shallow  methods  of  representation  and  inference,  such  as  the  use  of  production 
rules  to  encode  empirical  knowledge.  First-generation  expert  systems  are  often  dismissed  on 
the  grounds  that  shallow  methods  have  inherent  and  fatal  shortcomings  which  prevent  them 
from  achieving  problem-solving  behaviors  that  expert  systems  should  possess.  Examples  of 
such  desirable  behaviors  include  graceful  performance  degradation,  the  handling  of  novel 
problems,  and  the  ability  of  the  expert  system  to  detect  its  problem-solving  bmits. 

This  paper  analyzes  the  relationship  between  the  techniques  used  to  build  expert 
systems  and  the  behaviors  they  exhibit  to  show  that  there  is  not  sufficient  evidence  to  link 
the  behavioral  shortcomings  of  first-generation  expert  systems  to  the  shallow  methods  of 
representation  and  inference  they  employ.  There  is  only  evidence  that  the  shortcomings 
are  a  consequence  of  a  general  lack  of  knowledge.  Moreover,  the  paper  shows  that  the 
first-generation  of  expert  systems  employ  both  shallow  methods  and  most  of  the  so-called 
deep  methods.  Lastly,  we  show  that  deeper  methods  augment  but  do  not  replace  shallow 
reasoning  methods;  most  expert  systems  should  possess  both. 

Keywords 
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function  knowledge,  first-principles  knowledge,  causal  knowledge,  production  rules,  multi- 
purpose problem  solving,  multi-level  domain  models,  brittleness,  declarative  knowledge  rep- 
resentation, performance  degradation,  novel  problem  solving,  compilation. 


1      Introduction 


The  distinction  between  deep  and  shallow  expert  systems  arose  in  the  early  1980s  from  a 
simple  intuition:  that  the  expert  systems  which  had  been  built  up  until  that  time  had  serious 
limitations.  Previous  writings  on  this  topic  discussed  these  limitations  in  terms  of  behaviors 
that  expert  systems  were  unable  to  achieve,  e.g.,  (Davis,  1984;  Genesereth,  1984;  Hart,  1982; 
Forbus,  1988).  These  analyses  then  took  the  logical  step  of  examining  the  techniques  that 
were  used  to  construct  these  systems  under  the  assumption  that  the  behavioral  limitations 
of  these  programs  could  be  traced  to  these  techniques.  New  techniques  and  research  goals 
were  proposed  that  were  directed  towards  achieving  the  desired  behaviors. 

This  paper  shows  that  there  are  three  problems  with  previous  analyses.  First,  some 
behaviors  are  so  poorly  defined  that  it  is  impossible  to  determine  whether  a  given  program 
exhibits  them  or  not.  Second,  some  arguments  as  to  how  the  potential  behaviors  of  these 
programs  are  limited  by  the  techniques  used  to  build  them  are  not  convincing.  And  lastly, 
the  new  techniques  proposed  to  overcome  these  limitations  often  bear  a  striking  resemblance 
to  existing  techniques  that  were  used  in  the  first  generation  of  expert  systems. 

A  major  thesis  of  this  paper  is  that  the  deep/shallow  distinction  has  its  origins  in,  and 
can  be  distilled  down  to,  one  central  hypothesis:  that  to  attain  more  sophisticated  behaviors, 
the  next  generation  of  "deep"  expert  systems  will  need  to  bring  much  more  knowledge  to 
bear  on  the  problem-solving  task  than  the  first  generation  of  "shallow"  expert  systems.  One 
might  view  this  claim  as  trivial  since  the  conjecture  that  "in  the  knowledge  lies  the  power" 
has  existed  for  many  years  (Feigenbaum,  1977).  Unfortunately,  this  conjecture  makes  the 
job  of  constructing  more  powerful  programs  sound  too  easy  since  it  says  that  to  do  so  we 
merely  give  them  more  knowledge.  It  does  not  tell  us  what  knowledge  to  include,  how  to 
structure  it,  nor  how  to  reason  with  it  to  solve  different  problems  efficiently. 

The  remainder  of  this  paper  is  structured  as  follows.  Section  2  outlines  behaviors 
that  knowledge  based  systems  should  exhibit,  such  as  graceful  degradation  near  the  limits 
of  problem  solving  and  the  ability  to  solve  novel  problems.  Section  3  analyzes  techniques 
that  appear  relevant  to  obtaining  these  behaviors,  such  as  use  of  first-principles  knowledge 
and  knowledge  compilation. 


2      Behavioral  Goals  for  Expert  Systems 


Expert  systems  are  programs  that  solve  problems  using  a  large  amount  of  domain-specific 
knowledge,  usually  in  a  domain  requiring  human  expertise,  such  as  medical  diagnosis.  The 
external  observable  characteristics  that  expert  systems  should  possess  can  be  described  in 
terms  of  behavioral  goals.  In  this  section,  behavioral  goals  are  defined,  and  the  successes 
and  failures  of  first-generation  expert  systems  to  achieve  these  goals  are  described. 

The  determination  that  a  given  behavioral  goal  cannot  be  attained  using  existing 
techniques  is  difficult.  Theoretical  approaches  are  too  weak  to  offer  definitive  answers,  so 
there  is  a  need  to  rely  on  experimental  evidence.  Strictly  speaking,  conclusive  experimental 
evidence  can  only  be  of  a  positive  sort.  An  empirical  approach  can  only  definitively  show 
that  it  is  possible  to  build  a  given  type  of  program,  by  presenting  an  example  of  such  a 
program.  But  such  an  approach  cannot  show  that  is  is  impossible  to  build  a  given  type  of 
program.  Negative  evidence  resulting  from  the  failure  to  build  a  certain  sort  of  program 
can  never  be  definitive  since  it  can  always  be  argued  that  the  experimenters  lacked  the 
skill,  commitment,  or  resources  to  achieve  the  behavioral  goal.  In  practice,  of  course,  as 
such  negative  evidence  accumulates  it  is  assigned  more  and  more  credibifity.  But  negative 
examples  are  extremely  rare  in  the  field  of  expert  systems  at  present.  There  must  be 
a  general  skepticism  regarding  claims  about  the  limitations  of  existing  techniques,  given 
these  methodological  considerations. 


2.1      Problem-Solving  Explanations 

A  distinguishing  feature  of  some  of  the  earliest  expert  systems,  such  asMYCIN,  was  the  abil- 
ity to  provide  'what'  and  'how'  explanations  of  problem-solving  behavior  (Buchanan  and 
Shortliffe,  1984;  Clancey,  1983).  Such  a  capability  is  an  immediate  consequence  of  the  rep- 
resentation of  expertise  using  production  rules,  and  the  use  of  production  rules  as  the  major 
method  of  inference.  An  explanation  of  a  problem- solving  action  is  obtained  by  unwinding 
the  instantiated  production  rules  associated  with  a  problem-solving  action.  The  NEOMY- 
CIN program  builds  upon  the  MYCIN  framework  and  encodes  most  of  the  problem-solving 
strategy  for  the  heuristic  classification  problem- solving  method  in  meta-level  production 
rules  (Clancey,  1984).  This  enables  NEOMYCIN  to  provide  explanations  of  problem-solving 
strategy  as  well  as  explanations  of  domain-level  problem  solving  actions. 

The  behavioral  goal  of  providing  straightforward  problem-solving  explanations  has 


been  demonstrated  by  existing  expert  systems,  such  as  MYCIN.  Indeed,  the  ability  to  pro- 
vide explanations  is  viewed  as  one  of  the  denning  traits  of  an  expert  system.  While  this 
behavioral  goal  has  been  largely  met,  future  research  in  this  area  faces  two  major  challenges. 

The  first  major  challenge  relates  to  tailoring  explanations  to  individual  users,  as  is 
routinely  done  by  human  experts.  To  achieve  this,  more  sophisticated  explanation  programs 
for  expert  systems  will  need  to  incorporate  a  user  model  that  can  tailor  an  explanation  to  the 
user.  For  example,  a  different  explanation  of  the  mechanism  behind  a  liver  disorder  should 
be  given  to  a  high  school  student,  a  medical  student,  and  a  physician.  More  sophisticated 
explanation  programs  will  also  be  able  to  give  the  same  explanation  at  different  levels  of 
detail. 

The  use  of  more  complex  and  deep  methods  of  problem  solving  will  provide  a  challenge 
for  explanation  programs.  Inference  will  occur  in  a  variety  of  ways,  and  hence  the  inferences 
connected  with  a  given  problem-solving  action  will  not  be  limited  to  chaining  of  production 
rules.  More  information  will  be  involved  in  any  one  problem-solving  action,  and  hence  there 
will  be  a  need  to  abstract  the  essentials  of  an  explanation  from  the  myriad  problem- solving 
details.  TheABEL  program  is  a  medical  diagnosis  program  that  provides  causal  explanations 
at  various  levels  of  detail;  each  level  gives  a  coherent  account  of  the  patient's  case  (Patil 
et  al.,  1981). 


2.2      Graceful  Performance  Degradation 

Experts  usually  have  a  narrow  range  of  expertise.  Yet,  an  expert's  problem-solving  abili- 
ties will  usually  degrade  gracefully  near  the  limits  of  the  expertise,  such  as  on  peripheral 
problems.  Also,  an  expert  is  usually  aware  when  a  problem  is  near  or  beyond  his  or  her 
expertise  -  this  is  termed  limit  detection.  Expert  systems  should  exhibit  such  behavior.  It 
is  claimed  that  expert  systems  cannot  degrade  gracefully,  that  they  are  brittle  (Lenat  and 
et  al,  1986;  Holland,  1986). 

A  major  reason  for  the  brittleness  of  expert  systems  is  simply  that  they  are  computer 
programs,  and  programs  are  brittle.  Programs  generally  can't  handle  cases  for  which  they 
have  not  been  programmed.  This  problem  is  usually  alleviated  in  conventional  computer 
programming  by  devoting  the  majority  of  the  programming  effort  and  the  majority  of  the 
code  to  handling  exceptions  and  non-standard  input.  Because  expert  systems  have  been 
principally  constructed  in  research  labs,  this  common  practice  has  not  been  followed.  An- 
other common  method  used  in  conventional  programs  is  to  have  an  error  handling  routine. 


An  error  check  is  generated  if  the  program  encounters  something  unexpected,  such  as  di- 
vide by  0  or  a  type  check,  and  control  is  passed  to  an  error  handling  routine.  Considering 
that  the  most  basic  programming  practices  for  making  programs  less  brittle  have  not  been 
followed,  it  is  not  surprising  that,  in  practice,  expert  systems  exhibit  brittleness. 

There  has  been  almost  no  research  on  understanding  the  phenomena  of  brittleness 
in  expert  systems,  such  as  measuring  how  rapidly  a  system's  performance  degrades  as 
problems  become  more  and  more  peripheral,  nor  research  to  determine  how  extensive  a 
task  it  would  be  to  encode  the  knowledge  required  for  peripheral  problems.  There  have 
not  been  attempts  to  apply  relatively  straightforward  solution  approaches  to  this  problem, 
such  as  those  methods  described  above  that  are  used  in  the  construction  of  conventional 
computer  programs.  The  only  major  effort  is  theCYC  project  for  encoding  a  large  amount  of 
common  sense  knowledge  (Lenat  and  et  al,  1986).  But  this  is  a  long-term  effort  and  the  way 
an  actual  expert  system  would  use  such  knowledge  is  beyond  the  scope  of  theCYC  project. 
There  have  not  been  any  studies  in  understanding  the  difficulty  in  adding  new  knowledge 
when  an  expert  system's  performance  exhibits  brittleness  on  a  peripheral  problem.  Indeed, 
it  would  seem  that  performance  degradation  would  provide  a  powerful  basis  for  automated 
knowledge  acquisition. 

While  it  is  generally  not  recognized,  some  of  the  earliest  expert  systems  possessed 
some  form  of  limit  detection.  For  example,  theMYCIN  program  informs  its  user  if  it  believes 
a  patient's  problem  is  outside  its  scope  of  expertise.  It  is  not  clear  how  difficult  it  is  to  build 
expert  systems  that  have  sophisticated  knowledge  of  their  own  limitations;  we  know  of  no 
negative  evidence  in  the  literature  of  attempts  to  build  such  systems.  It  might  be  possible  to 
make  limit  detection  very  sophisticated  by  constructing  a  companion  expert  system  whose 
purpose  is  to  evaluate  whether  or  not  a  given  problem  is  within  another  expert  system's 
range  of  expertise;  no  one  has  attempted  this. 

Thus,  this  behavioral  requirement  is  an  important  but  rather  nebulous  one  about 
which  AI  has  little  hard  evidence.  It  is  difficult  to  evaluate  the  degree  to  which  existing 
systems  suffer  from  these  limitations  nor  the  effort  required  to  satisfy  this  requirement. 


2.3      Problem-Solving  Speed 

Most  problem-solving  is  resource-limited:  a  solution  must  be  obtained  within  a  given  period 
of  time  or  using  less  than  a  certain  amount  of  computational  resources.  Human  expertise 
is  nicely  tailored  to  satisfy  resource  limitations.    For  example,  based  on  very  sparse  in- 


formation,  a  physician  has  to  search  a  very  large  space  of  potential  1  diagnoses  in  a  very 
short  amount  of  time.  A  physician  is  able  to  achieve  this  by  storing  a  very  large  amount  of 
knowledge  and  by  the  use  of  a  highly- compiled  form  of  diagnostic  expertise. 

Most  expert  systems  are  highly  knowledge-intensive  and  perform  only  a  limited 
amount  of  inference.  Yet,  most  expert  systems  research  from  MYCIN  onwards  has  been 
conducted  at  the  limits  of  the  available  computational  speed  and  memory  resources.  Pro- 
grams such  as  theTEIRESIAS  program  for  interactive  knowledge  acquisition  had  to  be  run 
in  separate  stages  because  of  memory  limitations  (Davis,  1982).  Despite  impressive  strides 
in  AI  hardware  over  the  last  decade,  large  expert-system  building  tools  such  asKEE  and 
Knowledge  Craft  are  often  still  too  slow  on  large  problems  to  be  very  effectively  used.  This 
situation  may  be  brought  under  control  over  the  next  few  years  by  the  introduction  of  more 
powerful  AI  workstations. 

Currently,  the  use  of  more  complete  reasoning  techniques,  such  as  reasoning  with  a 
complete  schematic  of  a  circuit  to  diagnosis  the  circuit,  is  almost  always  several  orders  of 
magnitude  slower  than  diagnosis  using  compiled  expertise  within  the  same  problem  domain. 
This  is  especially  true  for  large  problems. 


2.4      Novel  Problem  Solving 

It  has  been  argued  that  it  is  desirable  for  an  expert  system  to  be  able  to  solve  "novel" 
problems,  and  that  existing  expert  systems  are  unable  to  do  so.  The  notion  of  novelty  is 
another  difficult  behavioral  requirement  to  define  precisely.  Davis  (Davis,  1984)  appears  to 
view  novel  problems  as  those  which  the  designer  of  the  expert  system  did  not  anticipate 
while  building  the  system.  For  example,  since  a  programmer  must  anticipate  every  disease 
MYCIN  diagnoses  by  writing  one  or  more  explicit  production  rules,  MYCIN  is  not  capable 
of  diagnosing  a  novel  disease.  The  method  by  which  MYCIN  accomplishes  diagnosis  may 
however  appear  novel  to  a  physician  that  is  observing  the  expert  system's  behavior. 

The  ability  to  solve  novel  problems  is  sufficiently  imprecise  that  depending  upon 
one's  interpretation  it  is  either  impossible  to  achieve,  or  is  routinely  achieved  by  existing 
programs.  It  can  be  understood  more  clearly  by  considering  the  distinction  between  expert 
systems  that  use  generative  candidate  descriptions,  and  those  that  use  enumerative  candi- 
date descriptions.  This  distinction  refers  to  whether  an  expert  system  derives  the  cases  it 
reasons  about  from  a  pre-enumerated  set,  or  whether  it  generates  these  cases  dynamically. 


For  example,  consider  diagnostic  problem-solving.  One  way  to  view  the  task  that  a 
diagnostic  system  confronts  is  as  follows.  Given  information  about  the  expected  structure 
of  a  device,  plus  a  description  of  its  actual  behavior,  the  task  is  to  find  the  actual  (mal- 
functioning) structure  of  the  device.  Usually  the  actual  structure  includes  a  malfunctioning 
component,  but  it  could  include  extra  components,  such  as  solder  bridges  in  circuits.  A  gen- 
eral approach  to  this  problem  is  to  consider  a  set  of  candidate  device  structures  -  the  case 
descriptions  -  and  determine  which  candidate's  predicted  behavior  matches  the  observed 
behavior  of  the  actual  device  most  closely  (Reiter,  1987;  DeKleer  and  Williams,  1987). 

These  candidates  can  have  two  possible  origins.  Candidate  structures  and  their  asso- 
ciated behaviors  can  be  retrieved  from  pre-enumerated  classes  that  the  program  has  stored, 
or  they  can  be  generated  by  the  program.  Usually  the  generative  approach  breaks  the 
description  of  a  complex  device  into  many  components,  uses  a  set  of  pre-defined  operators 
to  introduce  defects  into  selected  components,  and  then  computes  the  behavior  of  the  new 
aggregate  device.  The  enumerative  approach  stores  all  different  possible  malfunctioning 
structures  and  their  associated  behaviors. 

MYCIN  uses  the  enumerative  approach.  It  has  a  list  of  possible  disease  states  (struc- 
tures) and  their  associated  symptoms  (behaviors),  and  its  rules  match  these  behaviors 
against  cases  it  is  presented  with  to  determine  a  diagnosis.  DART  (Genesereth,  1984)  and 
Davis'  system  (Davis,  1984)  use  the  second  approach.  They  start  with  one  model  of  the 
structure  and  behavior  of  a  computer  system  and  generate  a  potentially  large  set  of  other 
device  models  from  this  prototype,  which  are  matched  against  the  case  at  hand.  The  gen- 
eration process  is  guided  by  the  structure  of  the  device.  DART  does  the  generation  by  trying 
to  prove  that  each  component  of  a  device  is  broken,  e.g.,  that  a  component  it  has  been 
told  is  an  AND  gate  is  in  fact  not  behaving  like  an  AND  gate.  Davis'  system  uses  a  similar 
technique  called  constraint  suspension  to  generate  candidate  devices.  Candidates  are  gen- 
erated by  alternately  suspending  application  of  the  constraints  that  describe  the  behaviors 
of  different  components  of  a  device,  and  then  simulating  the  outputs  of  that  component  by 
copying  them  from  the  empirically  observed  outputs  of  the  component.  Thus  the  behavior 
of  the  device  model  is  coerced  to  match  the  behavior  of  the  actual  device. 

The  decision  as  to  whether  to  employ  generative  or  enumerative  descriptions  in  a  given 
expert  system  is  based  on  two  considerations.  First  is  the  classic  trade-off  between  storage 
space  and  computation  time.  For  many  problems  it  is  not  feasible  to  store  all  the  relevant 
candidates  explicitly,  such  as  all  possible  malfunctioning  structures  of  a  computer  system, 
even  under  the  single  fault  assumption.  Second,  a  generation  algorithm  must  exist.  For  some 
problems  there  may  be  no  generation  algorithm  which  is  sufficiently  fast  and  sufficiently 
constrained.  For  example,  it  is  not  possible  to  generate  all  the  ways  in  which  a  human  body 
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and  all  known  infectious  bacteria  could  interact  to  produce  observable  symptoms.  The 
theory  of  how  this  interaction  occurs  is  incomplete:  biological  science  cannot  provide  us 
with  the  operators  needed  to  construct  the  space  of  "candidate  device  descriptions".  Thus, 
MYCIN  contains  a  list  of  those  disease  states  that  medical  science  has  encountered. 

Thus  far  we  have  distinguished  the  notions  of  stored  versus  generated  candidates. 
It  is  also  instructive  to  blur  this  distinction  to  consider  Davis'  hypothesis  that  "reasoning 
from  first  principles  offers  the  possibility  of  dealing  with  novel  faults.  As  we  have  seen,  our 
system  does  not  depend  for  its  performance  on  a  catalog  of  observed  error  manifestations" 
(Davis,  1984).  Davis  is  essentially  contrasting  generative  with  enumerative  systems.  This 
hypothesis  is  hard  to  evaluate  because  the  notion  of  novelty  is  a  hard  one  to  pin  down  in 
AI  programs.  Davis  seems  to  define  novelty  in  terms  of  programmer  forethought,  i.e.,  novel 
situations  are  those  which  the  programmer  has  not  considered  beforehand.  Thus  Davis 
claims  that  in  the  enumerative  framework  the  author  of  a  program  must  carefully  consider 
every  possible  case  the  program  will  encounter  and  encode  the  solution  for  that  case.  If 
the  program  encounters  a  "novel"  case  that  its  author  did  not  think  of,  the  program  will 
likely  fail.  Under  the  generative  framework,  Davis  claims  that  the  program  will  be  able  to 
generate  any  conceivable  fault,  and  thus  cases  that  would  be  novel  to  the  program's  author 
will  not  be  novel  to  the  program. 

This  novelty  hypothesis  is  not  convincing.  Under  both  frameworks  the  author  of 
the  program  must  think  about  what  classes  of  cases  the  program  will  encounter.  When 
Davis'  circuit  diagnosis  program  was  constructed  its  authors  must  have  considered  what 
classes  of  faults  are  usually  observed  in  digital  circuits  so  they  would  know  what  classes  of 
fault-generation  operators  to  catalog  within  the  program.  And  under  both  frameworks  cases 
may  be  described  sufficiently  generally  that  they  match  unanticipated  problems  and  produce 
either  correct  or  incorrect  solutions.  MYCIN'S  enumerative  framework  clearly  does  not  require 
the  author  to  think  about  every  distinct  patient  whose  symptoms  will  be  described  toMYCIN , 
but  only  about  classes  of  patients.  Both  methods  allow  the  program  to  consider  clusters 
of  problem  cases  which  the  programmer  must  define  ahead  of  time.  And  the  programmer 
must  consider  these  cases  if  the  program  is  to  solve  them  correctly  for  reasons  other  than 
luck. 

The  decision  to  use  generative  versus  enumerative  descriptions  is  an  engineering  de- 
cision that  depends  upon  such  issues  as  the  existence  of  a  tractable  generation  algorithm, 
the  computational  complexity  of  this  algorithm,  and  the  size  of  the  case  space.  There  will 
be  times  when  it  is  easier  for  the  programmer  to  list  a  set  of  generation  operators  and  rules 
for  combining  them,  and  times  when  it  will  be  easier  to  enumerate  the  classes  of  cases.  The 
generative  approach  is  well  suited  to  the  domain  of  the  systems  of  Davis  and  Genesereth  for 


circuit  diagnosis;  the  generation  algorithm  is  simple  and  complete,  although  very  compu- 
tationally expensive.  The  enumerative  approach  is  well  suited  toMYCIN's  domain  because 
no  generation  algorithm  exists. 


2.5      Multi-Purpose  Problem  Solving 

The  possession  of  expertise  allows  an  expert  to  do  much  more  than  just  get  the  right  answer 
to  a  problem.  In  addition,  experts  often  have  the  capability  to  explain  the  reasons  for  their 
problem  solving  actions,  teach  domain  knowledge  and  problem  solving  skills  to  a  novice 
or  another  expert,  explain  the  observed  problem  solving  behavior  of  a  novice  or  another 
expert,  and  realize  when  their  problem  solving  knowledge  is  inadequate  to  solve  a  particular 
problem.  Lastly,  when  at  an  impasse,  an  expert  can  often  elicit  from  another  expert  the 
exact  knowledge  necessary  to  solve  the  problem.  Expert  systems  should  be  able  to  exhibit 
such  multi-purpose  problem  solving  in  their  domain  of  expertise.  It  would  be  especially 
convenient  if  one  knowledge  base  for  a  domain  could  support  these  diverse  dimensions  of 
expertise.  This  is  an  important  design  goal  for  an  expert  system. 

The  achievement  of  multi-purpose  problem  solving  appears  to  be  highly  dependent 
on  making  the  knowledge  in  the  expert  system  as  modular,  explicit,  and  declarative  as 
possible  (Buchanan  and  Shortliffe,  1984).  Consequently,  developing  methods  of  knowledge 
representation  that  emphasize  these  characteristics  has  been  a  driving  force  in  expert  sys- 
tems research  from  its  earliest  days.  The  method  of  knowledge  representation  used  by 
MYCIN  was  certainly  a  large  step  in  the  direction  of  explicitly  specifying  domain  knowledge 
in  a  modular  fashion.  Knowledge  was  chunked  into  production  rules  that  were  individu- 
ally comprehensible  to  domain  experts.  This  representation  was  able  to  support  problem 
solving  and  the  generation  of  "how"  and  "why"  explanations.  TheTEIRESIAS  (Davis,  1982) 
andGUIDON  (Clancey,  1979)  showed  thatMYCIN's  representation  provided  the  basis  for  pro- 
grams to  accomplish  to  varying  degrees,  interactive  knowledge  acquisition  and  intelligent 
tutoring,  respectively. 

The  NEOMYCIN  program  was  a  reconstruction  of  MYCIN  that  made  the  method  of 
knowledge  representation  and  inference  even  more  declarative  and  explicit,  especially  with 
respect  to  knowledge  of  strategy.  This  allowed  theNEOMYCIN  program  to  provide  strategic 
and  as  domain-level  explanations,  and  further  facilitated  the  use  of  the  same  knowledge  for 
multiple  purposes.  For  example,  the  GUIDON- WATCH  and  GUIDON-MANAGE  programs  pro- 
vide ways  of  teaching  the  knowledge  in  aNEOMYCIN  knowledge  base  to  a  student  (Clancey, 
1986).  And  the  ODYSSEUS  program  shows  how  the  method  of  knowledge  representation  of 
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NEOMYCIN  provides  the  basis  for  explaining  the  observed  actions  of  a  student  or  an  expert, 
and  also  the  basis  of  expanding  the  domain-level  knowledge  via  apprenticeship  learning 
(Wilkins,  1988b;  Wilkins,  1988a). 


2.6      Problem-Solving  Sophistication 

There  is  a  belief  that  "deep  systems  will  solve  problems  of  significantly  greater  complexity 
than  surface  systems  can"  (Chandrasekaran  and  Mittal,  1983).  One  might  support  this 
intuition  of  Chandrasekaran  by  noting  that  most  expert  systems  constructed  to  date  have 
contained  at  most  several  thousand  rules  and  are  capable  of  solving  at  most  a  tiny  subset 
of  the  problems  in  a  domain  such  as  medicine. 

However,  as  we  noted  at  the  beginning  of  Section  2,  such  a  lack  of  positive  evidence 
hardly  constitutes  convincing  negative  evidence.  One  could  just  as  easily  construe  existing 
systems  to  be  proof  that  these  techniques  are  valid  for  a  sizable  subset  of  a  domain  such 
as  internal  medicine  (Pople,  1982),  and  that  it  is  only  a  matter  of  time  before  knowledge 
bases  are  constructed  using  existing  techniques  that  span  all  of  medicine. 

AI  has  no  accepted  way  of  measuring  the  complexity  of  a  problem  domain  that  would 
enable  one  to  describe  the  types  of  problems  that  existing  techniques  are  able  to  solve,  and 
the  classes  of  unsolvable  problems.  One  way  to  measure  the  complexity  of  a  problem  domain 
is  by  measuring  the  complexity  of  a  program  that  covers  the  problem  domain.  However 
we  lack  a  principled  method  for  this.  Even  if  we  had  a  principal  method  it  would  only 
provide  an  upper  bound  and  this  assumes  that  it  can  be  proven  that  the  program  is  correct. 
Possible  measurements  suggested  by  Buchanan  for  measuring  the  complexity  of  an  expert 
system  are  knowledge  base  size,  solution  space  size,  and  average  inference  complexity  for 
different  expert  systems  (Buchanan,  1987).  The  expert  systems  that  Buchanan  discusses 
have  knowledge  bases  with  hundreds  of  concepts  in  their  vocabularies  and  contain  thousands 
of  rules.  They  solve  problems  with  millions  and  tens  of  millions  of  possible  solutions.  It  is  not 
clear  how  to  characterize  the  more  complex  problems  that  Chandrasekaran  (Chandrasekaran 
and  Mittal,  1983)  mentions,  nor  the  degree  to  which  existing  techniques  will  continue  to 
scale  as  faster  hardware  becomes  available. 
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3      Techniques  for  Building  Expert  Systems 


There  is  a  close  connection  between  the  techniques  used  to  construct  an  expert  system  and 
the  behavior  exhibited  by  the  program.  The  primary  goal  of  this  section  is  to  analyze  the 
dependencies  between  techniques  and  behaviors.  Particular  attention  is  given  to  previous 
arguments  in  the  literature  concerning  limitations  of  existing  techniques  and  proposals  for 
new  techniques  that  can  provide  the  basis  for  more  sophisticated  behaviors. 

The  techniques  that  will  be  discussed  fall  into  three  classes.  The  first  class  concerns 
the  type  of  knowledge  to  be  used,  such  as  causal,  empirical,  or  first-principles  knowledge. 
The  second  class  pertains  to  representing  and  reasoning  with  this  knowledge,  such  as  the 
use  of  production  rules  and  the  explicit  representation  of  control  knowledge.  Lastly,  we 
consider  the  amount  of  knowledge  a  system  contains.  In  what  follows,  each  technique  will 
be  described  in  detail  and  we  will  consider  how  its  use  should  contribute  to  the  satisfaction 
of  different  behavioral  requirements. 

Our  methodological  comments  at  the  beginning  of  Section  2  are  relevant  here.  It  is  not 
clear  why  previous  authors  often  call  for  the  complete  abandonment  of  existing  techniques 
when  little  negative  evidence  exists  to  substantiate  claims  that  a  given  behavior  cannot  be 
achieved  using  a  certain  technique.  The  desired  behaviors  may  require  only  an  improvement 
in  degree  of  the  capabilities  of  existing  systems. 


3.1      Causal  Knowledge 

There  is  a  commonly  held  belief  that  first-generation  expert  systems  did  not  use  causal 
knowledge,  and  that  such  knowledge  should  supplant  the  use  of  empirical  knowledge  in  ex- 
pert systems.  Exactly  what  constitutes  causal  knowledge  is  never  made  clear.  Just  because  a 
knowledge  base  contains  h'nks  labeled  "causes"  does  not  mean  the  program  employs  a  well 
developed  notion  of  causality.  Philosophers  have  studied  causality  for  hundreds  of  years 
without  synthesizing  a  particularly  coherent  understanding  of  this  complicated  concept.  In 
a  much  shorter  time  AI  has,  shall  we  say,  not  exceeded  a  proportionate  contribution.  We 
will  consider  philosophical  discussions  of  causality  and  then  examine  the  relation  between 
causal  knowledge  and  expert  systems. 

Nagel  lists  four  types  of  causal  explanation  in  science:  deductive  explanation,  prob- 
abilistic explanation,  teleological  explanation,  and  genetic  explanation  (Nagel,  1961)  .   He 
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argues  that  some  scientific  laws  have  a  causal  basis,  while  others,  such  as  the  Boyle-Charles 
Gas  Law,  "simply  asserts  a  certain  concomitance  in  the  variation  of  the  specified  attributes 
of  a  gas,  and  is  therefore  generally  regarded  as  making  no  causal  statement"  (p22).  He  also 
notes  that  completely  sufficient  conditions  for  the  occurrence  of  specific  events  are  rarely  if 
ever  known. 

In  contrast,  the  philosopher  Mackie  centers  his  discussion  of  causality  around  what 
he  terms  an    INUS  condition,  which  is  used  to  define  a  cause  of  an  event: 


A  is  an  INUS  condition  of  a  result  P  iff,  for  some  X  and  some  Y,  (AaX)vY 
is  a  necessary  and  sufficient  condition  of  P,  but  A  is  not  a  sufficient  condition 
of  P  and  X  is  not  a  sufficient  condition  of  P  (Mackie,  1965). 


Suppes  analyzes  five  different  conflicting  intuitions  about  causality  as  developed  by 
other  philosophers,  in  terms  of  his  own  probabilistic  theory  of  causality  (Suppes,  1984). 
His  theory  includes  the  notion  of  a  genuine  cause,  which  is  a  prima  facie  cause  that  is  not 
spurious.  The  terms  prima  facie  cause  and  spurious  cause  are  defined  as  follows. 


An  event  B  is  a  prima  facie  cause  of  an  event  A  if  and  only  if  (i)  B  occurs 
earlier  than  A,  and  (ii)  the  conditional  probability  of  A  occurring  when  B  occurs 
is  greater  than  the  unconditional  probability  of  A  occurring. 

An  event  B  is  a  spurious  cause  of  A  if  and  only  if  B  is  a  prima  facie  cause 
of  A,  and  there  is  a  partition  of  events  earlier  than  B  such  that  the  conditional 
probabihty  of  A,  given  B  and  any  element  of  the  partition,  is  the  same  as  the 
conditional  probability  of  A,  given  just  the  element  of  the  partition. 


A  comparison  the  definitions  made  by  these  philosophers  makes  it  clear  that  many 
reasonable  but  incompatible  notions  of  causality  exist.  It  is  thus  rather  confusing  to  read 
the  suggestions  below  that  future  expert  systems  should  include  causal  knowledge,  and  that 
existing  expert  systems  do  not  include  it,  without  any  explication  of  what  it  means  for  a 
machine  (or  anything  else)  to  have  causal  knowledge: 


There  are  domains  where  problem  solving  clearly  relies  on  more  than  com- 
piled experience.  Other  varieties  of  knowledge  are  involved,  knowledge  of  struc- 
ture and  causal  models  (Davis,  1982)  (p4). 
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Surface  systems  ...   have  no  underlying  representation  of  such  fundamental 
concepts  as  causality,  intent,  or  basic  physical  principles...  (Hart,  1982)  (pl2) 

Examination  of  the  philosophical  literature  reveals  that  causality  is  a  much  more 
complex  and  less  well  understood  concept  than  most  authors  acknowledge.  Thus,  when  AI 
authors  provide  few  clues  as  to  what  they  believe  causal  knowledge  is,  it  is  difficult  to  accept 
their  claims  that  existing  systems  lack  it,  or  that  future  systems  will  need  it.  In  addition, 
these  authors  never  make  clear  what  specific  behaviors  this  lack  of  causal  knowledge  prevents 
current  programs  from  attaining,  or  will  enable  programs  containing  causal  knowledge  to 
attain. 

What  substance  these  claims  do  have  stems  from  intuitive  notions  of  causality,  which 
we  will  use  as  the  basis  for  further  discussion.  Our  position  is  that  it  is  not  clear  whether 
existing  expert  systems  have  causal  knowledge  or  not.  The  discussion  below  suggests  that  if 
one  accepts  Suppes'  definition  of  causality,MYCIN  does  have  causal  knowledge.  An  examina- 
tion of  the  knowledge  in  theDENDRAL  andCASNET  programs  suggests  that  their  knowledge 
meets  intuitive  criteria  of  causality. 

Let  us  consider  theMYCIN  steroids  rule: 


If  1)  Infection  requiring  therapy  is  meningitis 

2)  Only  circumstantial  evidence  is   available 

3)  The  type   of   the   infection  is  bacterial 

4)  The  patient  is  receiving  corticosteroids 

Then     There   is  evidence  that  organisms  causing 

the   infection  are  klebsiella-pneumoniae   (.2), 
e.    coli   (.4),    or  pseudomonas-aeruginosa  (.1). 

This  rule  is  justified  by  the  underlying  knowledge  that  corticosteroids  impair  the 
body's  ability  to  control  organisms  that  normally  reside  within  the  body  (Clancey,  1983). 
That  is,  corticosteroids  cause  the  body  to  enter  a  state  in  which  certain  organisms  are 
likely  to  proliferate  to  an  abnormal  degree.  In  this  rule,  corticosteroids  fall  under  Suppes' 
definition  of  a  prima  facie  cause  of  the  infection  because  the  conditional  probability  of 
infection  by  these  organisms  is  higher  when  the  patient  is  receiving  corticosteroids  than 
when  they  are  not.  That  is,  the  rule  links  two  events,  A  (infection  by  certain  organisms), 
and  B  (administration  of  corticosteroids),  where  the  conditional  probability  of  A  occurring 
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when  B  occurs  is  greater  than  the  unconditional  probability  of  A  occurring.  Note  that 
this  definition  of  causality  admits  most  empirical  associations  as  causal  knowledge.  This 
position  has  been  rejected  by  most  previous  authors  in  AI.  Suppes'  position  may  or  may 
not  be  correct,  but  it  is  formulated  with  so  much  more  detail  and  precision  that  it  is  worthy 
of  serious  consideration  (which  is  beyond  the  scope  of  this  paper). 

It  is  also  instructive  to  perform  a  thought  experiment  in  which  we  transform  the 
above  rule  and  theory  into  these  hypothetical  rules: 

If  1)   The  patient  is  receiving  corticosteroids 

Then     There   is   evidence  that   the  patient's   immune 
system  is   suppressed. 


If  1)  Infection  requiring  therapy   is  meningitis 

2)  Only  circumstantial  evidence  is  available 

3)  The  type  of   the   infection  is  bacterial 

4)  The  patient's  immune  system  is   suppressed 

Then     There   is  evidence  that  organisms  causing 

the   infection  are  klebsiella-pneumoniae   (.2), 
e.    coli    (.4),      or  pseudomonas-aeruginosa  (.1). 

Here  we  have  givenMYCIN  more  knowledge  than  it  had  before,  and  apparently  knowl- 
edge of  a  causal  sort.  We  can  imagine  adding  more  and  more  rules  which  describe  what 
factors  suppress  the  body's  immune  system,  and  what  cellular  and  chemical  events  are  in- 
volved in  that  suppression.  As  MYCIN'S  knowledge  increases,  it  takes  on  a  more  and  more 
causal  character.  But  at  no  point  was  a  new  type  of  knowledge  added  toMYCIN;  we  merely 
added  more,  increasingly  detailed  knowledge  of  a  type  whichMYCIN  already  has. 

One  troubling  aspect  of  the  singleMYCIN  production  rule  above  is  that  we  know  this 
rule  is  incomplete:  MYCIN  does  not  possess  the  medical  knowledge  which  describes  why  a 
patient  who  receives  corticosteroids  is  likely  to  be  infected  by  the  named  organisms.  Thus 
we  feel  uncomfortable  sayingMYCIN  has  causal  knowledge  because  we  know  its  knowledge  is 
incomplete.  This  amounts  to  saying  that  a  program  only  has  causal  knowledge  if  it  knows 
all  that  experts  know  about  a  phenomenon.  This  is  an  unsatisfactory  definition  since  it 
has  the  property  that  a  program  will  cease  to  have  causal  knowledge  of  a  phenomenon  if 
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experts  acquire  more  knowledge  about  a  phenomenon.  This  should  not  change  the  type 
of  knowledge  the  program  has,  but  only  affects  how  correct  and  detailed  we  believe  the 
knowledge  to  be. 

Another  interesting  program  to  examine  theCASNET  program  developed  by  Weiss  et 
al  (Weiss  et  al.,  1978).CASNET  diagnoses  diseases  related  to  glaucoma.  To  do  so  it  employs 
three  different  planes  of  knowledge.  The  central  plane  is  a  causal  model  of  disease  processes 
in  this  domain,  in  the  form  of  a  graph.  Nodes  of  the  graph  represent  hypotheses  about 
the  disease  process,  and  edges  in  the  graph  represent  causal  connections  between  these 
hypotheses,  e.g.,  "Cupping  of  the  Optic  Disc  causes  Glaucomatous  Visual  Field  Loss". 

A  second  plane  of  knowledge  contains  clinical  observations,  which  are  connected  to 
nodes  in  the  causal  plane.  Thus  clinical  observations  yield  inferences  about  what  disease 
processes  are  occurring  in  the  patient.  The  third  plane  of  knowledge  specifies  what  disease 
processes  in  the  causal  plane  are  associated  with  what  disease  diagnoses.  Thus,  to  diagnose  a 
patient, CASNET  uses  clinical  observations  of  the  patient  to  determine  what  disease  processes 
are  occurring  in  the  patient,  and  then  finds  what  diagnoses  are  associated  with  these  disease 
processes. 

The  DENDRAL  program  also  uses  causal  knowledge.  Its  production  rules  describe 
the  causal  interactions  between  organic  molecules  and  a  mass  spectrograph.  These  rules 
describe  which  chemical  bond  cleavages  are  likely  to  be  caused  by  a  molecule's  passage 
through  a  mass  spectrograph. 

The  most  important  conclusion  of  this  section  is  that  the  term  "causal  knowledge" 
is  very  poorly  denned  within  Al.  In  addition,  a  brief  examination  of  several  first  generation 
expert  systems  suggests  that  they  do  contain  some  causal  knowledge.  Thus,  the  rather 
strong  claim  that  existing  expert  systems  contain  no  causal  knowledge,  but  that  it  is  essential 
to  add  this  new  sort  of  knowledge  to  future  expert  systems,  should  be  deflated  to  the  more 
reasonable  claim  that  future  expert  systems  will  require  more  causal  knowledge  than  existing 
systems  have. 


3.2      Empirical  Knowledge 

Genesereth  has  claimed  that  programs  such  as  MYCIN  and  INTERNIST 
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use  'shallow'  theories  of  human  pathophysiology  in  the  form  of  'rules'  that 
associate  symptoms  with  possible  diseases.  TheDART  program  contains  no  rules 
of  this  form.  Instead,  it  works  directly  from  a  'deep'  theory  consisting  of  infor- 
mation about  intended  structure  ...  and  expected  behavior"  (Genesereth,  1984) 
(p412). 


In  a  similar  vein,  Davis  has  claimed  that  when  applied  to  building  a  digital  circuit 
diagnostic  system,  the  traditional  shallow  approach  would  lead  to  the  creation  of  empirical 
associations  such  as  the  following,  where  B7  and  AF2  are  points  within  a  circuit  (Davis, 
1982): 


If  The   signal  is   OK  at   B7     and 

The   signal  is  blocked  at   AF2 
Then  The   signal   is   lost   somewhere  between  B7   and  AF2 


Davis  (Davis,  1982)  quite  correctly  concludes  that  a  better  approach  would  be  to 
write  "a  set  of  rules  that  tried  to  capture  just  the  signal  tracing  skill,  and  had  a  separate 
description  of  structure."  He  summarizes: 


The  point  is  simply  that  the  Accepted  Wisdom  focuses  on  the  use  of  rules 
embodying  empirical  associations.  It  does  not  offer  us  any  tools  for  constructing 
structural  descriptions  of  the  sort  we  need,  it  does  not  offer  us  any  techniques  for 
using  those  descriptions  to  guide  diagnosis,  and  perhaps  even  more  important, 
it  does  not  even  lead  us  to  think  in  such  terms  (Davis,  1982)  (pl8)  [his  italics]. 


And  along  these  same  lines,  Chandrasekaran  and  Mittal  say  that 


Surface  systems  are  at  best  a  data  base  of  pattern-decision  pairs,  with 
perhaps  a  simple  control  structure  to  navigate  through  the  data  base  (Chan- 
drasekaran and  Mittal,  1983). 


Davis  and  Genesereth  are  suggesting  that  we  abandon  the  use  of  "rules  embodying 
empirical  associations"  when  constructing  expert  systems.    This  suggestion  is  unclear  in 


17 


several  respects.  Is  it  the  method  of  knowledge  representation  and  inference,  the  empiri- 
cal associations,  or  both  that  are  bad?  What  desired  behaviors  do  they  prevent  us  from 
attaining,  and  why? 

It  is  unclear  exactly  what  is  an  "empirical  association".  Consider  an  inference  rule 
that  states  that  a  particular  consequent  should  be  inferred  from  a  particular  antecedent. 
Perhaps  this  rule  is  an  empirical  association  if  our  only  justification  for  writing  the  rule 
is  that  in  the  past  we  have  empirically  observed  this  antecedent  and  consequent  to  be 
associated  with  one  another.  A  program  which  only  uses  empirical  associations  is  one  in 
which  all  inferences  have  this  type  of  justification.  Apparently,  this  is  to  be  contrasted 
with  rules  with  some  other  type  of  justification,  e.g.,  if  a  scientific  theory  predicts  that  the 
antecedent  will  necessarily  cause  the  consequent  to  occur,  or  when  the  antecedent  implies 
the  consequent  by  definition. 

Using  this  definition,  the  quotes  above  are  not  accurate  since  many  ofMYCIN's  rules 
are  not  simply  empirical  associations,  but  have  some  justification  from  medical  science,  such 
as  the  steroids  rule  discussed  in  Section  3.1. 

Perhaps  then  these  authors  have  in  mind  a  slightly  different  definition,  namely  that  a 
program  uses  empirical  associations  if  the  program  itself  does  not  know  the  justifications  of 
its  inferences.  MYCIN  certainly  does  not  know  how  medical  science  justifies  the  steroids  rule. 
This  knowledge  might  be  useful  to  a  program  because  it  will  allow  the  program  to  check 
the  justification  of  a  rule  to  be  sure  that  the  rule  is  applicable  in  the  current  situation,  thus 
decreasing  brittleness.  An  approach  to  providing  explicit  justifications  for  heuristic  rules  is 
described  in  (Smith  et  al.,  1985). 

A  quest  to  produce  an  expert  system  free  of  empirical  associations  can  never  succeed 
since  the  program's  justification  structures  must  bottom  out  somewhere.  If  every  justifica- 
tion is  supported  by  another  justification  then  the  program  must  employ  either  an  infinite 
set  of  justifications  or  a  circular  set.  The  unjustified  rules  in  a  program  with  neither  of 
these  properties  must  be  empirical  associations.  Thus  every  finite  program  that  is  free  of 
circular  reasoning  must  rest  upon  empirical  associations.  The  only  alternative  is  resting  on 
unjustified  assumptions  or  postulates. 

That  empirical  associations  are  essential  should  not  be  surprising  since  all  of  our 
scientific  causal  knowledge  ultimately  rests  on  experimental  data  that  consists  of  empirical 
associations.  The  inductive  inference  from  which  scientific  theories  are  constructed  is  even 
less  trustworthy.  If  empirical  associations  are  the  basis  of  all  of  science,  perhaps  they  are 
not  so  bad  after  all. 
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One  way  to  act  upon  the  advice  above  to  avoid  using  empirical  associations  is  to 
provide  a  program  with  as  many  justifications  as  possible  for  the  rules  that  it  uses.  But  we 
must  recognize  that  these  justifications  will  ultimately  contain  many  empirical  associations. 


3.3      First-Principles  Knowledge 

It  has  been  suggested  that  existing  expert  systems  do  not  use  first  principles,  and  that  their 
use  will  be  important  to  future  expert  systems: 

Reasoning  from  First  Principles  offers  the  possibility  of  dealing  with  novel 
faults  (Davis,  1984). 

At  the  extremes,  a  surface  system  directly  associates  input  states  with  ac- 
tions, whereas  a  deep  system  makes  deductions  from  a  compact  collection  of 
fundamental  principles  (Hart,  1982)  (pi 2). 

The  major  intuition  behind  the  feeling  that  expert  systems  should  have  deep 
models  is  the  observation  that  often  even  human  experts  resort  to  first  principles 
when  confronted  with  an  especially  knotty  problem.  Also,  there  is  the  empirical 
observation  that  a  human  expert  who  cannot  explain  the  basis  for  his  reasoning 
by  appropriate  reference  to  the  deeper  principles  of  his  field  will  have  credibility 
problems...  (Chandrasekaran  and  Mittal,  1983). 

These  authors  seem  to  have  the  intuition  that  the  behavioral  goals  of  solving  novel 
problems  -  such  as  removing  brittleness  and  solving  more  complex  problems  -  can  be 
satisfied  by  giving  programs  knowledge  of  and  the  ability  to  reason  from  first  principles. 
The  key  characteristics  of  these  principles  appear  to  be  general  applicability  but  weak  ability 
to  aid  in  solving  specific  problems.  Generality  is  what  makes  them  superior  to  the  non-first 
principles  contained  by  the  program:  first  principles  are  sufficiently  general  to  apply  to 
some  of  the  novel,  difficult,  and  peripheral  problems  for  which  the  non-first  principles  fail. 
It  is  on  these  problems  that  the  system  "falls  back  on  first  principles". 

But  non-first  principles  must  have  advantages  over  first  principles  since  first  principles 
are  apparently  applied  after  the  former  have  failed,  not  in  place  of  the  former.  That  is, 
first  principles  are  tried  last,  non-first  principles  are  tried  first.    The  non-first  principles 
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could  have  several  advantages.  First  principles  might  be  slower  than  non-first  principles: 
a  very  general  problem  solving  method  is  expected  to  be  slower  than  methods  that  have 
been  tailored  to  particular  problems.  First  principles  could  be  more  difficult  to  apply  to 
a  specific  problem  -  they  may  be  difficult  to  operationalize.  Sometimes  it  may  not  be 
possible  to  operationalize  first  principles  at  all  -  they  might  not  apply  to  all  problems  in  a 
domain.  Finally,  first  principles  might  produce  incorrect  answers,  either  because  they  are 
overly  general  or  because  they  are  not  accurate. 

First  principles  might  differ  from  non-first  principles  in  terms  of  speed,  completeness, 
and  correctness.  Hart  (Hart,  1982)  has  the  intuition  that  first  principles  should  be  compact 
and  fundamental.  It  is  easy  to  think  of  domains  that  do  not  have  compact  first  principles. 
Medicine  has  a  huge,  incomplete,  inaccurate  set  of  first  principles  (much  of  biology),  educa- 
tion has  even  more  incomplete  and  inaccurate  first  principles,  quantum  mechanics  has  very 
complex  first  principles;  and  digital  circuit  diagnosis  appears  to  have  a  complete,  correct, 
and  small  set  of  first  principles. 

One  might  define  first  principles  as  the  lowest  level  of  knowledge  usually  employed  by  a 
professional  community,  i.e.,  the  lowest  level  of  knowledge  to  which  an  expert  must  usually 
revert  when  solving  problems  in  his  or  her  domain.  Thus,  first  principles  for  molecular 
biologists  are  the  laws  of  chemistry,  and  first  principles  for  teachers  include  theories  of  how 
children  learn. 

The  large  variation  in  the  form  of  first  principles  makes  them  extremely  difficult  to 
distinguish  from  non-first  principles.  We  have  characterized  first  principles  as  being  general 
and  some  combination  of  slow,  incomplete  and  incorrect.  All  these  adjectives  can  easily  be 
applied  to  the  principles  built  into  existing  expert  systems  -  the  latter  have  some  degree 
of  generality  and  certainly  have  limitations  of  speed,  completeness  and  correctness.  Given 
the  range  in  the  expected  properties  for  first  principles  in  the  domains  above  it  is  not  at  all 
clear  when  we  can  characterize  a  set  of  principles  as  first  principles  and  when  we  cannot. 

This  has  two  implications.  First,  if  first  principles  are  so  hard  to  distinguish  from  the 
principles  in  existing  expert  systems,  it  is  not  obvious  why  the  techniques  used  to  encode 
these  existing  principles  (e.g.,  production  rules)  would  not  suffice  to  encode  first  principles. 

Second,  it  would  appear  that  we  can  operationalize  the  advice  to  "use  first  principles 
to  build  deep  systems"  into  the  advice  to  construct  an  expert  system  that  can  attempt  to 
apply  more  than  one  problem  solving  method  to  a  given  problem.  The  fastest,  most  specific, 
most  complete,  most  correct  method  is  attempted  first.  Thus,  the  meaning  of  "fall  back 
on  first  principles"  is  simply  to  bring  an  additional  knowledge  to  bear  on  a  problem,  not  a 
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fundamentally  new  kind  of  method.  Attacking  a  problem  with  multiple  knowledge  sources 
will  improve  a  problem  solver's  behavior  by  contributing  to  the  solution  of  novel,  difficult, 
and  peripheral  problems,  appears  to  be  excellent  advice,  that  in  essence  instructs  us  to 
build  This  appears  to  be  excellent  advice,  that  in  essence  instructs  us  to  build  programs 
with  more  knowledge.  Notice  that  this  differs  from  the  original  interpretation  of  this  advice, 
which  viewed  first  principles  as  a  new,  special  kind  of  knowledge  which  existing  programs 
do  not  have. 


3.4      Structural  and  Functional  Knowledge 

It  has  been  argued  that  future  expert  systems  will  require  knowledge  of  the  structure  and 
function  of  physical  systems,  and  that  this  is  a  type  of  knowledge  that  the  first-generation 
expert  systems  did  not  have  or  use.  The  quotations  by  Davis  and  Genesereth  in  Section 
3.1  are  examples  of  such  statements.  This  case  is  very  similar  to  that  of  causal  knowledge: 
we  assert  that  existing  expert  systems  do  in  fact  contain  some  structural  and  functional 
knowledge,  but  their  performance  will  probably  be  improved  if  they  are  given  more  of  this 
type  of  knowledge. 

TheDENDRAL  program  (Lindsay  et  al.,  1980)  has  explicit  knowledge  concerning  the 
allowable  chemical  structures  that  complex  organic  molecules  can  assume,  and  can  gener- 
ate all  the  potential  chemical  structures  that  can  potentially  be  formed  when  a  complex 
molecule  is  heated  within  a  mass  spectrograph.  The  operation  of  theDENDRAL  program 
can  be  viewed  as  follows.  DENDRAL  starts  with  a  deep  first-principles  or  structure-function 
model  of  chemistry,  calledCONGEN.  Given  this  deep  theoretical  model,  and  given  empirical 
knowledge  of  the  behavior  or  molecules  in  a  mass  spectrogram  as  a  set  of  classified  training 
instances,  DENDRAL  synthesizes  a  set  of  shallow  production  rules  that  can  interpret  mass 
spectrograms. 

TheRl  program  configures  DigitalVAX  computers  from  customer  orders  (McDermott, 
1982).  It  has  extensive  knowledge  of  the  structural  components  ofVAXes,  and  of  the  prop- 
erties of  these  components  such  as  their  power  consumption  and  the  ways  in  which  they 
may  be  interconnected. 

In  certain  domains,  a  structure-function  model  is  not  available.  For  example,  in  the 
medical  domain  of  theMYCIN  program,  the  pathways  of  most  of  the  processes  are  unknown 
to  medical  science.  Even  simple  knowledge,  such  as  the  mechanism  by  which  an  infection 
causes  a  fever,  is  unknown.    Hence  the  reasoning  ofMYCIN  and  physicians  in  the  domain 
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of  meningitis  proceeds  from  heuristic  associational  knowledge  of  the  behaviors  (symptoms) 
that  result  from  different  structures  (the  human  body  combined  with  different  bacteria). 

It  seems  very  plausible  that  there  exist  problems  whose  solutions  require  significantly 
more  knowledge  of  structure  and  function  than  existing  expert  systems  have.  But  it  is  not 
true  that  these  systems  do  not  use  knowledge  of  structure  or  function. 


3.5      Amount  of  Knowledge 

As  stated  earlier,  our  central  thesis  is  that  the  desired  behavioral  requirements  for  future 
expert  systems  will  be  realized  by  providing  them  with  larger  amounts  of  knowledge  rather 
than  new  types  of  knowledge  which  they  do  not  currently  employ.  Techniques  for  managing 
and  utilizing  this  knowledge  will  also  be  important  as  we  discuss  in  later  sections  of  this 
paper. 

The  following  test  has  been  proposed  by  which  one  may  decide  whether  one  program 
is  deeper  than  another. 


Consider  two  models  of  expertise  M  and  M  .  We  will  say  that  M  is  deeper- 
than  M  if  there  exists  some  implicit  knowledge  in  M  which  is  explicitly  repre- 
sented or  computed  in  M  "  (Klein  and  Finin,  1987). 


The  authors  of  this  test  acknowledge  that  notions  such  as  implicit  and  explicit  knowl- 
edge are  left  to  intuition,  so  it  can  be  difficult  to  apply  this  predicate  in  many  cases. 

The  reason  to  build  programs  that  contain  more  knowledge  is  so  they  will  solve  prob- 
lems whose  solution  requires  more  knowledge.  Consideration  of  the  amount  of  knowledge 
necessary  to  solve  problems  in  different  domains  provides  an  explanation  of  both  the  capa- 
bilities and  limitations  of  existing  expert  systems.  The  behavior  of  many  expert  systems  is 
described  by  the  graph  in  Figure  1.  It  suggests  how  the  competence  of  a  program  changes 
over  time  as  it  contains  more  and  more  knowledge.  This  accurately  characterizes  the  experi- 
ence of  adding  more  and  more  rules  forRl  program  via  manual  knowledge  acquisition  when 
failures  were  encountered  (McDermott,  1982).  A  similar  analysis  is  given  by  Feigenbaum 
and  Lenat  (Feigenbaum  and  Lenat,  1987). 

Two  properties  of  this  curve  are  important:  the  fact  that  it  increases  monotonically, 
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Figure  1:  This  figure  shows  the  relationship  between  the  size  of  an  expert 
system's  knowledge  base  and  the  sophistication  of  the  problem-solving 
performance. 

and  the  fact  that  its  derivative  (the  marginal  utility  of  knowledge)  starts  out  small,  grows 
significantly  larger,  and  then  decreases  again.  The  behavior  of  the  curves  derivative  appears 
to  be  do  to  the  existence  of  a  relatively  small  "core  set"  of  knowledge  which  is  required  to 
solve  a  relatively  large  set  of  problems  in  the  domain,  with  larger  amounts  of  more  esoteric 
knowledge  required  to  round  out  the  system.  Between  points  A  and  B,  new  knowledge 
interacts  synergistically  with  old  to  increase  competence  tremendously.  But  after  point 
J9,  the  marginal  utility  of  knowledge  begins  to  decrease,  and  most  core  knowledge  for  the 
domain  has  been  captured  by  the  time  point  C  is  reached. 

Different  problem  domains  would  exhibit  curves  with  the  same  shape,  but  with  dif- 
ferent scaling  along  the  axes,  e.g.,  a  very  complex  domain  would  scale  the  curve  outward 
to  the  right,  requiring  more  knowledge  to  exhibit  the  same  degree  of  competence.  Imagine 
building  expert  systems  of  roughly  the  same  size  in  many  different  domains.  Their  perfor- 
mance would  vary  according  to  the  scale  of  the  curve  in  that  domain.  For  example, MYCIN 
probably  lies  near  point  C:  its  validation  studies  have  shown  that  it  is  able  to  solve  many 
problems  within  its  domain.  After  a  system  has  reached  this  level  of  competence,  for  each 
new  case  that  is  unable  to  be  solved,  more  and  more  rules  will  be  added  to  the  knowledge 
base  that  are  each  relevant  to  fewer  and  fewer  cases.  But  if  a  system  of  roughly  MYCIN'S  size 
is  to  be  built  in  a  much  more  complex  domain,  the  system  would  be  closer  to  point  A,  where 
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it  exhibits  poor  performance,  but  a  high  marginal  utility  of  knowledge.  This  situation  has 
often  led  various  authors  to  hypothesize  the  explanations  for  the  shortcomings  of  existing 
expert  systems  such  as  those  discussed  earlier  in  this  paper.  It  may  be  that  the  limitations 
of  these  systems  are  due  not  to  the  types  of  knowledge  the  systems  employ,  but  to  the 
complexity  of  the  different  problem  domains:  different  domains  require  different  amounts 
of  knowledge  for  a  given  level  of  problem-solving  competence. 

Figure  1  is  of  course  a  simplification.  The  curve  might  start  out  completely  flat 
because  some  critical  mass  is  needed  before  any  problems  can  be  solved.  And  the  curve 
might  actually  decrease  at  the  right  because  the  program  is  unable  to  properly  employ  a 
huge  knowledge  base.  Curves  of  slightly  different  shapes  may  result  from  adding  different 
pieces  of  knowledge  in  variable  order. 


3.6      Production  Rules 

Production  rules  have  been  described  as  an  inappropriate  basis  for  constructing  programs 
that  exhibit  a  number  of  the  behaviors  that  have  been  previously  described  (Davis,  1984; 
Genesereth,  1984). 

First,  let  us  clarify  this  hypothesis.  Certainly  the  claim  is  not  being  made  that 
production  rules  cannot  in  principle  be  used  to  achieve  these  behaviors.  Since  production 
rules  have  been  proved  Turing- equivalent  (Brainerd  and  Landweber,  1974)  this  claim  would 
imply  that  the  behavioral  goal  could  never  be  met  by  a  computer  program,  so  instead,  the 
claim  must  be  that  production  rules  are  not  a  good  engineering  tool  for  this  task.  Claims  of 
this  form  are  certainly  reasonable;  such  concerns  motivate  the  design  of  new  programming 
languages,  for  example.  However,  most  authors  never  make  clear  what  the  engineering 
limitations  of  production  rules  are.  Thus  it  is  difficult  to  comprehend  their  arguments  and 
to  see  where  future  research  should  be  directed. 

The  quotations  in  Section  3.1  by  Davis  and  Genesereth  discuss  the  degree  to  which 
production  rules  are  suited  to  representing  the  structure  and  behavior  of  digital  circuits. 
Genesereth  (Genesereth,  1984)  contrasts  production  rule  representations  used  by  expert 
systems  with  representations  used  by  theDART  program  such  as  the  description  of  an  AND 
gate  shown  below: 

(If      (AND    (ANDG  d) 

(VAL   (IB   1   d)   t   ON) 
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(VAL   (IN  2  d)    t   ON)) 
Then   (VAL   (OUT   1   d)    t   ON))) 

(If      (AND   (ANDG  d) 

(VAL   (IN   1  d)    t   OFF)) 
Then   (VAL   (OUT   1   d)    t   OFF))) 

(If      (AND   (ANDG  d) 

(VAL   (IN  2  d)   t   OFF)) 
Then   (VAL   (OUT  1   d)   t    OFF))) 


But  Genesereth's  description  is  in  fact  a  set  of  production  rales  which  represent  the 
behavior  of  anAND  gate.  These  rules  directly  associate  the  possible  inputs  of  anAND  gate 
with  the  outputs  it  would  generate.  Thus  production  rules  are  able  to  encode  structural 
and  functional  knowledge  about  a  device,  contrary  to  Davis  and  Genesereth's  assertions. 

De  Kleer  and  Brown  (DeKleer  and  Brown,  1984)  hold  the  view  that  constraints  should 
be  used  to  express  device  behavior  instead  of  production  rules.  They  offer  three  arguments 
in  support  of  this  view. 

The  first  argument  is  that  constraints  can  express  behavior  more  succinctly  than 
production  rules.  Imagine  that  we  wish  to  describe  the  behavior  of  a  pipe  by  stating  that 
its  input  pressure  must  match  its  output  pressure.  DeKleer  (DeKleer  and  Brown,  1984) 
states  that  within  the  {-1,  0,  +1}  value  space  used  to  qualitatively  represent  numerical 
processes,  this  could  be  done  with  one  constraint: 


Pin  =  P< 


out 


but  this  same  expression  would  require  the  use  of  six  production  rules: 


(IF   (EQUAL  Pin   -1)   THEN   (SET  P^t    -1)) 
(IF   (EQUAL  Pin   0)   THEN   (SET  P^t   0)) 
(IF   (EQUAL  Pin   1)   THEN   (SET  P^    1)) 

(IF   (EQUAL  P^t   -1)   THEN    (SET  Pin   -1)) 
(IF   (EQUAL  Pwt   0)   THEN   (SET  Pin   0)) 
(IF   (EQUAL  P^t    1)   THEN    (SET  Pin    1)) 
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Now  it  is  fairly  trivial  to  condense  this  set  of  rules  by  the  use  of  variables  to: 

(IF   (EQUAL  PtnU)   THEN    (SET  P^U)) 
(IF   (EQUAL  PwtU)   THEN    (SET  Pin   $X)) 

The  constraint  is  still  slightly  more  concise  here,  because  constraints  are  bidirectional, 
whereas  rules  are  unidirectional.  While  rules  can  be  invoked  within  both  a  forward  chaining 
and  a  backward  chaining  interpreter,  a  valid  inference  is  assumed  to  occur  only  in  the 
forward  direction;  backward  chaining  can  create  only  a  goal,  not  a  conclusion.  Put  another 
way,  the  rule  A  D  B  allows  a  system  to  conclude  that  B  is  true  when  it  knows  A  is  true,  but 
not  that  A  is  false  when  B  is  known  to  be  false.  However,  constraints  will  cause  problems 
when  we  wish  to  express  unidirectional  relations,  or  when  we  have  different  degrees  of  belief 
in  inferences  in  the  two  directions.  So  the  following  constraint  is  too  expressive  (we  believe 
the  forward  but  not  necessarily  the  backward  inference): 

light-ssitch-off  =  light-bulb-off 

The  second  argument  offered  in  favor  of  constraints  in  (DeKleer  and  Brown,  1984) 
is  that  the  if-then  form  of  production  rules  "falsely  implies  the  passage  of  time"  between 
the  test  and  action.  Perhaps  this  point  depends  on  exactly  who  is  reading  a  given  rule;  it 
is  not  clear  which  interpretation  is  the  correct  one.  But  symmetrically,  perhaps  constraints 
falsely  imply  instantaneous  linkages  within  a  device  where  in  fact  it  takes  time  for  changes 
to  propagate.  In  an  event,  it  is  quite  possible  to  modify  production  rules  to  include  a  time 
parameter  which  allows  one  to  explicitly  indicate  whether  or  not  time  is  passing: 

IF   (EQUAL  Pinn  TO   THEN   (SET  P^tU  T1+€) 


The  third  argument  is  more  an  indictment  of  the  technique  of  forward  simulation  than 
of  production  rules  per  se.  DeKleer  and  Brown  describe  an  example  in  which  two  pipes  are 
connected,  with  the  pressure  rising  at  one  end  and  held  constant  at  the  other  end.  They 
find  that  if  one  uses  production  rules  to  represent  local  information  relating  pressure  and 
flow,  and  then  reasons  in  a  traditional  forward  direction  to  try  to  derive  the  pressure  at  the 
joint  between  the  pipes,  this  solution  method  fails.  The  reason  given  is  that  this  problem 
cannot  be  solved  by  simply  propagating  values  through  the  constraints  which  describe  these 
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pipes  (i.e.,  by  forward  chaining  through  production  rules).  The  pressure  at  the  joint  must 
be  derived  by  assuming  every  possible  value  for  this  pressure,  and  then  determining  if  any 
constraints  are  violated.  The  assumed  pressure  which  does  not  yield  a  contradiction  is  the 
solution.  They  conclude  that:  "constraints  can  support  both  imperative  interpretations 
(they  can  be  executed)  and  assertional  interpretations  (i.e.,  they  can  be  reasoned  over)". 

This  limitation  is  hardly  related  to  the  use  of  production  rules.  Production  rules  can 
be  both  executed  and  reasoned  over  just  as  constraints  can  be.  If  DeKleer  and  Brown  had 
attempted  to  utilize  constraints  in  a  forward  qualitative  simulation  they  would  have  just  the 
same  problem  as  with  production  rules.  It  is  common  for  AI  programs  to  manipulate  a  set 
of  production  rules  in  several  ways:  MYCIN'S  rules  were  used  for  diagnosis,  for  explanation, 
and  for  interactive  transfer  of  expertise  in  theTEIRESIAS  program  (Davis,  1982). 

Another  possibility  is  that  these  authors  are  criticizing  the  use  of  production  systems 
rather  than  production  rules,  i.e.,  the  use  of  a  fixed  forward  or  backward  chaining  interpreter 
to  evaluate  these  rules.  Davis  and  Genesereth  use  techniques  called  Constraint  Suspension 
and  Resolution  Residue,  respectively,  in  their  circuit  diagnosis  programs.  It  is  reasonable  to 
assert  that  different  tasks  (e.g.,  diagnosis  versus  prediction)  will  require  different  reasoning 
mechanisms.  But  as  discussed  in  the  previous  paragraph,  single  sets  of  production  rules 
have  been  used  by  multiple  reasoners  since  the  days  ofMYCIN. 

Thus,  we  see  that  production  rules  do  not  have  a  number  of  the  limitations  which  au- 
thors have  claimed.  Previous  statements  by  Davis,  Hart  and  Genesereth  have  also  suggested 
that  production  rules  are  unable  to  represent  causal  knowledge,  knowledge  of  structure  and 
function,  or  first  principles.  We  have  shown  in  earlier  sections  that  systems  such  as  MYCIN 
andDENDRAL,  which  are  built  from  production  rules,  do  contain  this  type  of  knowledge. 


3.7     Multi-Level  Domain  Models 

Davis  examines  the  reasoning  a  human  expert  might  use  to  track  down  a  fault  in  a  computer 
system  (Davis,  1984)  (page  14-19).  In  this  example  the  expert  is  given  a  gross  description 
of  the  machine's  functionality:  the  system  boots  and  responds  properly  at  the  console,  but 
user  terminals  are  dead.  On  this  basis  the  expert  rules  out  a  number  of  the  machine's 
components  at  the  Processor/Memory/Switch  (PMS)  level:  the  CPU,  the  system  disk,  and 
the  bus  between  them.  Davis  then  focuses  on  the  terminal  bus  and  the  terminals  themselves 
and  rules  out  the  latter.  He  probes  the  internal  state  of  the  bus  the  terminals  are  connected 
to,  focusing  on  increasingly  detailed  descriptions  of  the  hardware.  Eventually  he  is  able  to 
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determine  that  the  bus  interface  is  dropping  a  bit. 

This  example  involves  reasoning  about  the  internal  structure  of  complex  devices  at 
several  levels  of  detail.  The  descriptions  at  each  level  represent  the  structure  and  function 
of  the  device  with  different  degrees  of  abstraction.  Each  description  is  fairly  independent  of 
the  others  in  that  it  can  accept  diagnostic  information  that  is  compatible  with  its  level  of 
abstraction,  and  compute  diagnostic  hypotheses  and  tests  at  that  level  of  abstraction.  But 
some  connections  between  the  descriptions  must  exist  as  well,  to  allow  a  global  diagnostic 
procedure  to  home  in  on  a  precise  diagnosis  using  more  and  more  detailed  descriptions  in 
more  and  more  restricted  regions  of  the  device.  We  call  such  descriptions  multi-level  domain 
models1 . 

This  organization  of  knowledge  is  a  very  fruitful  avenue  for  future  research  because 
it  appears  to  be  relevant  to  producing  a  number  of  the  desired  behaviors.  One  advantage 
of  this  approach  is  faster  problem  solving.  A  system  may  be  able  to  achieve  satisfactory 
performance  if  it  reasons  at  an  abstract  level  for  problems  whose  solution  does  not  require  all 
the  system's  knowledge.  Or  it  could  time-share  problem-solving  at  several  levels  of  detail 
simultaneously  to  produce  the  most  detailed  solution  possible  within  unpredictable  time 
constraints. 

Another  possibility  provided  by  the  framework  is  validation  of  the  scope  of  problem 
solving.  Perhaps  abstract  levels  of  knowledge  could  be  used  for  problem  solving,  while 
underlying  levels  could  be  used  to  determine  if  the  abstract  levels  are  applicable  to  a  given 
problem.  An  underlying  theory  consisting  of  Newton's  laws  might  contain  qualifications 
such  as:  only  applicable  to  objects  moving  slowly  compared  to  the  speed  of  light,  and 
in  weak  gravitational  fields.  These  laws  might  be  compiled  to  produce  rules  for  solving 
problems  involving  bouncing  balls.  But  the  qualifications  at  the  underlying  levels  could  be 
checked  before  application  of  the  compiled  level.  Yet  another  advantage  of  multiple  levels 
relates  to  the  production  of  explanations.  Abstract  levels  of  a  multi-level  domain  model 
could  be  used  to  generate  concise,  general  explanations,  while  lower  levels  could  generate 
detailed,  focused  explanations. 

Several  first  generation  expert  systems  used  descriptions  with  only  some  of  the  prop- 
erties above.  Their  domain  models  were  structured  into  descriptions  at  several  levels  of 
detail.  But,  they  were  not  distinct  in  the  sense  described  above  because  generally  aD  levels 
of  detail  had  to  be  employed  when  solving  a  given  problem.  For  example,  the  HEARSAY 
speech  understanding  system  (Erman  et  al.,  1980)  did  model  speech  understanding  at  sev- 


These  are  to  be  distinguished  from  systems  that  contain  a  level  of  domain  knowledge  plus  one  or  more 
levels  of  control  knowledge. 
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eral  different  levels  of  detail.  But  the  input  signal  always  arrived  at  the  lowest  level  of  detail 
and  problem  solving  propagated  the  answer  up  through  the  system  to  produce  outputs  at 
the  sentence  level  (in  fact  processing  was  not  strictly  bottom-up).  Similarly,  PatiPsABEL 
system  contained  multiple  descriptions  at  different  levels  of  abstraction  in  the  medical  do- 
main of  electrolyte  disorders  (Patil  et  al.,  1981),  and  theHELIOS  digital  circuit  simulator 
models  digital  circuits  at  several  interlocking  levels  of  detail  (Brown  et  al.,  1983;  Foyster, 
1984). 

A  promising  approach  relates  to  deriving  a  shallow  reasoning  system  from  a  deep 
causal  model,  such  as  described  in  (Pearce,  1988).  This  shallow  reasoning  system  is  superior 
in  terms  of  time  and  space  complexity  measures,  and  can  also  be  guaranteed  to  be  complete. 
The  completeness  of  the  model  is  of  course  with  respect  to  the  causal  model,  which  may  or 
may  not  accurately  mirror  the  real  world. 

In  summary,  while  first  generation  expert  systems  include  descriptions  with  some  of 
the  attributes  of  multi-level  domain  models,  further  research  is  required  to  endow  future 
systems  with  the  behaviors  we  have  here  described. 


3.8      Reasoning  about  Real- Valued  State  Variables 

There  is  a  large  class  of  problems  whose  solutions  require  detailed  reasoning  about  the 
relationships  between  real- valued  state  variables  of  a  system.  For  example,  when  one  con- 
structs a  generative  model  of  a  device,  one  may  need  to  combine  the  pressures,  concentra- 
tions, temperatures,  masses,  accelerations,  and/or  voltages  of  its  components  to  compute 
the  aggregate  behavior  of  the  device.  Human  experts  are  often  able  to  reason  about  such 
dimensions  of  a  system  when  only  incomplete  information  is  available  about  the  values  of 
these  variables  and  the  relationships  among  them. 

Authors  of  existing  expert  systems  have  largely  side-stepped  these  issues  by  tackling 
classes  of  problems  where  this  type  of  reasoning  is  either  not  required,  or  where  very  simple 
types  of  reasoning  will  suffice,  e.g.,  inMYCIN's  domain. 

In  the  past  few  years,  researchers  in  the  Al  subfield  of  qualitative  reasoning  have 
developed  new  techniques  for  reasoning  about  complex  interactions  between  real-valued 
state  variables  in  the  presence  of  incomplete  information  and  in  the  presence  of  complicating 
factors  such  as  feedback.  New  qualitative  representations  have  been  developed  for  both 
the  values  of  state  variables  and  for  the  interactions  between  them  and  for  predicting  the 
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potential  behaviors  of  such  a  system  (Forbus,  1984;  Karp  and  Friedland,  1988;  DeKleer  and 
Brown,  1984;  Kuipers,  1985;  Kuipers,  1986;  Simmons,  1986).  Significant  progress  has  been 
made  in  this  field,  but  it  is  not  possible  to  construct  programs  whose  understanding  of  a 
complex  device  approaches  that  of  a  human  expert. 


3.9      Explicit  Representation  of  Control  Knowledge 

Another  technique  which  has  been  explored  in  the  development  of  expert  systems  is  the 
separation  of  the  program's  control  knowledge  from  its  domain  knowledge,  and  the  explicit 
representation  of  this  control  knowledge.  This  was  actually  a  major  goal  in  the  construction 
ofMYCIN,  wherein  the  expert  system  was  viewed  as  consisting  of  a  domain  knowledge  base 
and  an  inference  engine  that  accomplished  control  by  backward  chaining  of  production  rules. 

Important  efforts  that  have  investigated  the  separation  of  control  knowledge  from 
domain  knowledge  include  the  NEOMYCIN  program,  theBBl  blackboard  architecture,  and 
the  MRS  logic  programming  language.  NEOMYCIN  separates  knowledge  of  medical  domain 
knowledge  of  meningitis  infections  from  strategy  knowledge  for  performing  medical  diagnosis 
(Clancey,  1984).  BBl  allows  a  programmer  to  separate  problem  solving  operators  in  a  domain 
from  control  strategies  and  heuristics  that  guide  the  application  of  these  operators  (Hayes- 
Roth,  1985).  MRS  provides  metalevel  facilities  to  control  inference  in  a  Prolog-like  logic 
programming  language  (Russell,  1985). 

There  are  many  advantages  to  a  clean  separation  of  domain  and  control  knowledge. 
First,  it  facilitates  using  the  same  knowledge  for  different  purposes,  such  as  diagnosis,  design, 
teaching,  and  explanation.  Separate  inference  procedures  can  be  constructed  for  each  of 
these  tasks,  and  each  of  them  can  use  the  same  domain  knowledge  base.  Second,  an  explicit 
representation  of  the  control  knowledge  facilitates  the  construction  of  programs  that  can 
inspect  and  reason  about  the  control  knowledge.  MRS  allows  dynamic  determination  of 
the  order  in  which  clauses  of  a  conjunctive  rule  are  executed,  using  its  metalevel  control 
facilities.  Third,  it  simplifies  the  problem  of  knowledge  acquisition.  When  a  knowledge 
acquisition  program  modifies  the  knowledge  structures  of  a  program,  there  is  less  chance 
that  the  modifications  will  have  unforeseen  side  effects.  Fourth,  it  can  lead  to  cleaner,  more 
structured  explanations  that  differentiate  between  the  domain  operators  that  are  currently 
being  applied,  and  the  control  strategy  that  has  caused  them  to  be  applied  at  a  particular 
time.  Lastly,  explicit  representation  of  control  knowledge  facilitates  optimizing  control  for 
a  specific  task.  This  produces  faster  performance  in  some  cases. 
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A  complete  separation  of  control  and  domain  knowledge,  and  the  explicit  represen- 
tation of  control  knowledge,  can  be  viewed  as  long-term  research  goals.  The  NEOMYCIN, 
BBl,  andMRS  systems  represent  significant  steps  in  the  achievement  of  this  goal.  However, 
part  of  the  lesson  from  constructing  these  systems  has  been  the  discovery  of  ways  in  which 
there  is  an  incomplete  separation  of  domain  and  control  knowledge,  and  a  non-explicit 
representation  of  control  knowledge. 


3.10      Compilation 

Within  computer  science,  the  term  "compilation"  usually  refers  to  transforming  the  repre- 
sentation of  a  program  to  increase  its  efficiency  or  to  make  the  program  operational  on  a 
particular  hardware  architecture.  Within  artificial  intelligence,  compilation  also  refers  to 
the  process  of  transforming  a  program  to  increase  its  efficiency,  although  often  the  pro- 
gram is  to  be  optimized  for  a  particular  class  of  problems.  TheLEX  program,  for  example, 
transforms  its  control  knowledge  (which  is  represented  as  preconditions  for  integration  op- 
erators) in  the  process  of  solving  symbolic  integration  problems.  Another  good  example 
is  the  SOAR  program  in  which  the  principal  learning  method  is  chunking  preconditions  for 
problem-solving  operators  in  the  course  of  solving  problems  (Laird  et  al.,  1987). 

The  principal  method  of  obtaining  a  compiled  domain  model  for  an  expert  system  has 
been  to  extract  it  from  an  human  expert.  This  is  the  method  that  was  used  to  obtain  the 
compiled  domain  model  that  is  used  by  theMYCIN  program.  Another  approach  to  obtaining 
a  compiled  domain  model  is  to  extract  a  deep  domain  model  from  a  human  expert,  and  then 
compile  this  deep  domain  model  into  a  shallow  domain  model.  This  has  two  advantages. 
The  first  advantage  is  that  a  deep  domain  model  can  often  be  created  from  books  or  from 
documentation  of  a  domain  by  someone  who  is  not  a  domain  expert.  Another  advantage 
is  that  a  single  deep  domain  model  can  sometimes  be  compiled  into  many  different  shallow 
domain  models,  where  each  different  shallow  domain  model  is  created  for  a  different  class 
of  problems,  such  as  design  and  diagnosis. 

Compilation  almost  always  yields  an  improvement  in  the  space-time  complexity  of  a 
program  over  the  uncompiled  version  of  the  program.  Another  potential  advantage  with 
respect  to  expert  systems  is  shorter  explanations,  since  the  information  is  compressed.  Note 
that  information  is  lost  during  the  process  of  compilation  of  a  deep  domain  model  into  a 
shallow  one,  and  so  the  process  is  not  reversible. 

There  have  been  quite  a  few  research  efforts  in  the  direction  of  creating  a  shallow 
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domain  model  from  a  deep  domain  model.  The  usual  approach  is  to  synthesize  a  functional 
model  of  a  device  into  an  associational  model,  for  purposes  of  diagnostic  problem  solving 
(Chandrasekaran  and  Mittal,  1983). 

One  of  the  advantages  that  is  often  claimed  for  deep  domain  models  is  that  they 
provide  a  fallback  when  a  shallow  domain  model  proves  inadequate.  However,  a  shallow 
model  may  be  created  from  a  deep  model  in  such  a  way  that  this  will  not  be  true,  as  described 
in  the  following  conjecture  (Chandrasekaran  and  Mittal,  1983).  They  hypothesize  that  it 
may  be  possible  to  compile  a  shallow  domain  model  for  a  particular  goal  from  a  deep  domain 
model,  such  that  all  the  problems  that  the  deep  model  is  able  to  solve  can  be  solved  by  the 
shallow  model. 


Between  the  extremes  of  a  data  base  of  patterns  on  one  hand  and  repre- 
sentations of  deep  knowledge  (in  whatever  form)  on  the  other,  there  exists  a 
knowledge  and  problem-solving  structure  which  (1)  has  all  the  relevant  deep 
knowledge  "compiled"  into  it  in  such  a  way  that  it  can  handle  all  the  diagnostic 
problems  that  the  deep  knowledge  is  supposed  to  handle  if  it  is  explicitly  repre- 
sented and  used  in  problem-solving;  and  (2)  will  solve  the  diagnostic  problems 
more  efficiently;  but  (3)  it  cannot  solve  other  types  of  problems  -  i.e.  prob- 
lems' which  are  not  diagnostic  in  nature  -  that  the  deep  knowledge  structure 
potentially  could  handle. 


Even  if  a  shallow  domain  model  that  is  derived  from  a  deep  domain  model  can  solve  all 
the  problems  that  can  be  solved  by  the  deep  model,  the  complexity  of  the  resulting  shallow 
model  may  make  this  approach  not  worthwhile.  In  such  a  case,  the  correct  approach  would 
be  to  compile  a  shallow  model  that  would  give  fast  solutions  to  most  problems,  and  use  the 
deep  model  for  those  cases  in  which  the  shallow  domain  model  is  inadequate. 

The  concept  of  encapsulation  provides  a  nice  distinction  to  elucidate  the  difference 
between  compiled  and  deep  knowledge  (Simmons,  1988).  Compiled  knowledge  encapsu- 
lates the  interactions  that  occur  between  knowledge  elements;  elements  that  interact  are 
represented  as  different  clauses  in  the  same  compiled  rule. 

There  are  many  open  questions  that  relate  to  compilation.  For  example,  given  a 
change  to  a  deep  domain  model,  can  the  shallow  domain  model  be  incrementally  changed? 
There  is  a  lot  of  work  that  relates  to  quantifying  the  space- time  complexity  advantages 
of  compilation.    For  example,  might  it  be  worthwhile  to  have  several  different  compiled 
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versions,  and  always  use  the  most  efficient  version  that  is  sufficiently  detailed  for  the  current 
problem? 
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4      Summary 


The  first  generation  of  expert  systems  (1972-1981)  is  often  described  as  using  only  shallow 
methods  of  representation  and  inference.  These  expert  systems  are  then  dismissed  on  the 
grounds  that  use  of  these  methods  prevents  them  from  achieving  problem-solving  behav- 
iors that  expert  systems  should  possess.  This  paper  analyzed  the  dependencies  between 
behaviors  and  techniques  to  determine  what  existing  techniques  are  limiting  the  behaviors 
of  expert  systems,  and  what  new  techniques  are  likely  to  extend  their  capabilities.  This 
analysis  conflicted  with  those  of  other  authors  in  several  ways.  Some  techniques  and  be- 
haviors that  others  have  discussed  were  poorly  denned;  in  other  cases  the  links  between 
techniques  and  behaviors  were  either  not  specified  or  not  well  justified.  And  some  proposed 
new  techniques  for  future  systems  have  in  fact  been  used  by  previous  systems. 

The  past  inaccurate  characterizations  of  first-generation  expert  are  surprisingly  widespread, 
which  has  had  two  deleterious  effects.  First,  so-called  deep  methods  are  often  improperly 
viewed  as  an  an  alternative  to  shallow  methods,  rather  than  as  an  augmentation.  Second, 
there  is  an  overemphasis  on  developing  new  methods  to  address  the  behavioral  shortcomings 
of  expert  systems,  as  opposed  to  investigating  existing  techniques  in  more  detail  to  extend 
their  power. 

This  paper  first  discussed  a  set  of  behaviors  which  are  important  for  expert  systems 
to  exhibit.  The  behaviors  discussed  were  the  following: 

Expert  systems  must  be  able  to  explain  their  problem  solving  behavior.  These  expla- 
nations should  be  tailored  to  what  the  user  knows  so  that  they  are  neither  too  detailed  nor 
too  abstract. 

The  problem-solving  performance  of  an  expert  system  should  degrade  gracefully  as 
the  problems  presented  to  the  system  depart  from  the  system's  domain  of  expertise.  While 
early  expert  systems  such  as  MYCIN  were  in  fact  able  to  detect  that  some  problems  were 
outside  their  expertise,  very  little  research  has  been  done  since  to  determine  how  difficult 
such  limit  detection  is  in  general. 

As  the  functionality  of  expert  systems  has  increased,  their  speed  has  often  decreased. 
Future  systems  must  not  only  solve  complex  problems,  they  must  solve  them  within  rea- 
sonable resource  limitations. 

Some  authors  have  suggested  that  the  first  generation  of  expert  systems  are  unable 
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to  solve  novel  problems,  but  that  future  systems  should  have  this  capability.  Our  analysis 
shows  that  it  is  hard  to  define  what  novelty  is,  but  one  can  consider  novel  problems  as 
those  whose  solution  the  programmer  has  not  anticipated.  We  concluded  that  whether  the 
programmer  constructs  a  system  using  generative  ox  enumerative  candidate  descriptions, 
he  or  she  must  always  anticipate  what  classes  of  problems  the  system  will  encounter. 

To  solve  some  classes  of  problems,  the  problem  solver  must  reason  about  real-valued 
state  variables  such  as  pressures  and  voltages.  Standard  numerical  techniques  are  not  ap- 
propriate when  only  imprecise,  qualitative  values  are  available  for  these  variables,  which  is 
sometimes  the  case  for  problems  that  expert  systems  should  solve. 

Just  as  experts  can  employ  their  knowledge  for  multiple  purposes,  it  is  desirable  for 
expert  systems  to  use  their  knowledge  in  many  ways  such  as  diagnosis,  teaching,  and  the 
acquisition  of  additional  knowledge.  Knowledge  representation  methods  are  an  important 
key  for  this  ability,  and  those  used  by  NEOMYCIN  and  associated  programs  allowed  a  single 
knowledge  base  to  be  used  for  the  multiple  purposes  listed  above. 

Finally,  some  authors  have  suggested  that  future  expert  systems  will  solve  more 
complex  problems  than  first  generation  systems.  These  suggestions  do  not  identify  what 
techniques  are  limiting  the  complexity  of  problems  that  existing  systems  can  solve,  and  in 
general  it  is  difficult  to  be  precise  about  this  issue  because  the  field  has  few  methods  for 
characterizing  the  complexity  of  a  problem  domain. 

The  next  section  of  the  paper  considered  various  techniques  for  constructing  expert 
systems  that  might  endow  them  with  the  behaviors  above.  First,  the  use  of  certain  types  of 
knowledge  in  future  systems  was  considered.  Several  authors  have  suggested  that  the  first 
generation  of  expert  systems  lack  these  types  of  knowledge,  but  that  future  systems  will 
require  them. 

Consideration  of  causal  knowledge  revealed  that  it  is  not  at  all  clear  just  how  one  dis- 
tinguishes it  from  other  types  of  knowledge;  philosophers  present  many  possible  definitions 
of  causality.  But,  if  we  proceed  from  an  intuitive  notion  of  causality,  it  is  clear  that  early 
expert  systems  such  asDENDRAL  did  possess  causal  knowledge. 

Similar  remarks  apply  to  structure-function  knowledge  of  a  device  (although  these  are 
easier  to  define):  many  early  systems  did  have  this  sort  of  knowledge. 

Knowledge  of  first  principles  was  considered  next.  We  showed  that  the  characteristics 
of  first  principles  knowledge  varied  tremendously  between  application  domains.  Thus,  we 
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could  only  operationalize  the  method  of  "falling  back  on  first  principles"  as  advice  to  bring 
a  second  problem  solving  method  to  bear  on  a  problem.  So  first  principles  are  not  actuaUy 
a  type  of  knowledge,  but  merely  an  additional  source  of  knowledge  whose  properties  vary  in 
different  domains. 

The  next  section  considered  the  use  of  empirical  associations,  which  other  authors 
have  suggested  should  not  be  used  in  future  systems.  "Empirical  associations"  proved 
to  be  yet  another  fuzzy  term,  and  analysis  showed  that  it  is  hard  to  imagine  building 
systems  where  there  is  no  important  role  for  empirical  knowledge,  especially  since  all  of  our 
theoretical  knowledge  ultimately  rests  on  empirical  experience. 

Next  we  considered  various  ways  of  structuring  knowledge  in  expert  systems. 

Several  criticisms  of  production  rules  were  considered  and  found  to  have  little  sub- 
stance. Just  as  constraints  (a  proposed  alternative)  can  be  reasoned  over  for  several  pur- 
poses, so  too  can  production  rules.  Production  rules  are  relevant  to  almost  all  behaviors 
of  expert  systems  since  they  provide  fundamental  reasoning  and  representational  building 
blocks. 

Multi-level  domain  models  are  relatively  independent  models  of  a  domain  at  differ- 
ent levels  of  abstraction.  They  seem  relevant  to  providing  better  explanations,  increased 
problem  solving  speed,  and  more  sophisticated  problem  solving.  Further  research  will  be 
required  to  provide  expert  systems  with  such  models. 

Qualitative  reasoning  about  real-valued  state  variables  is  a  relatively  new  area  of  re- 
search; previous  expert  systems  had  no  techniques  to  solve  problems  with  qualitative  knowl- 
edge of  real- valued  state  variables. 

Separation  of  domain  from  control  knowledge  has  improved  explanations,  problem 
solving  speed,  and  multi-purpose  problem  solving  in  existing  expert  systems.  It  is  likely  to 
be  important  in  future  systems  as  well. 

Finally,  including  larger  amounts  of  knowledge  in  future  expert  systems  will  likely  be 
a  key  method  for  improving  both  their  problem  solving  sophistication  and  the  grace  with 
which  their  performance  degrades  on  peripheral  problems.  But  determining  how  to  structure 
large  quantities  of  knowledge  for  fast  use,  understandable  explanations,  and  the  solution  of 
multiple  types  of  problems  will  present  significant  challenges  which  future  researchers  must 
address. 
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